KAFKA-20617: add validation for cluster-id in Formatter#22384
KAFKA-20617: add validation for cluster-id in Formatter#22384gaurav-narula wants to merge 3 commits into
Conversation
This change adds validation for Uuids in Formatter which is used by `kafka-storage.sh` which allows users to fail fast before they inadvertently end up using an invalid cluster-id.
| if (clusterId.contains("=")) { | ||
| throw new FormatterException("The specified cluster id, " + clusterId + " contains padding and is invalid"); |
There was a problem hiding this comment.
Our randomUuid method is possible to output the uuid contains = sign:
This will not generate a UUID equal to 0, 1, or one whose string representation starts with a dash ("-")
So is this the expected validation?
There was a problem hiding this comment.
I don't think it is possible because Uuid#toString does not use padding and = sign can only be present when padding is enabled
There was a problem hiding this comment.
I think this actually covers a real gap. The one in which a CID like igNUVIdeSPO5JCZYFhOh7Q== would pass the validation, but would be returned as igNUVIdeSPO5JCZYFhOh7Q by Uuid.toString(). I would just move it outside the try-catch block as it throws FormatterException directly and would bypass the catch anyway.
| Uuid uuid = Uuid.fromString(clusterId); | ||
| if (Uuid.RESERVED.contains(uuid)) { | ||
| throw new FormatterException("The specified cluster id, " + clusterId + " is reserved"); | ||
| } |
There was a problem hiding this comment.
Let's also validate the starting - sign.
There was a problem hiding this comment.
I think these are cosmetic improvemenst that've been added over time but have no bearing on correctness. Failing validation on them would hinder migration from older clusters where - was allowed.
The motivation for avoiding - in the beginning was to avoid shell escaping issues when passing cluster id in CLI tools. https://issues.apache.org/jira/browse/KAFKA-13741
Lately, https://issues.apache.org/jira/browse/KAFKA-20072 avoided - altogether to allow easier copy-pasting.
There was a problem hiding this comment.
Good point. Let's add a comment about this for future reference.
fvaleri
left a comment
There was a problem hiding this comment.
@gaurav-narula good catch. Left some comments.
|
|
||
| @ParameterizedTest | ||
| @ValueSource(strings = {"unrvTtQISjar0JUWGU/8Pg", "igNUVIdeSPO5JCZYFhOh7Q==", "AAAAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAAAQ"}) | ||
| public void testFormatWithInvalidClusterId(String clusterId) throws Exception { |
There was a problem hiding this comment.
There's no corresponding positive test showing that a valid CID passes validation. This would guard against the validation being too aggressive.
| } | ||
| try { | ||
| if (clusterId.contains("=")) { | ||
| throw new FormatterException("The specified cluster id, " + clusterId + " contains padding and is invalid"); |
There was a problem hiding this comment.
| throw new FormatterException("The specified cluster id, " + clusterId + " contains padding and is invalid"); | |
| throw new FormatterException("The specified cluster id, " + clusterId + " is invalid: contains padding"); |
| assertEquals(expectedPrefix, | ||
| assertThrows(FormatterException.class, | ||
| formatter1.formatter::run). | ||
| getMessage().substring(0, expectedPrefix.length())); |
There was a problem hiding this comment.
Wouldn't be easier to rewrite like:
assertTrue(message.startsWith(expectedPrefix))
There was a problem hiding this comment.
I'm indifferent here - I followed the convention used in another test in the same class
| if (clusterId.contains("=")) { | ||
| throw new FormatterException("The specified cluster id, " + clusterId + " contains padding and is invalid"); |
There was a problem hiding this comment.
I think this actually covers a real gap. The one in which a CID like igNUVIdeSPO5JCZYFhOh7Q== would pass the validation, but would be returned as igNUVIdeSPO5JCZYFhOh7Q by Uuid.toString(). I would just move it outside the try-catch block as it throws FormatterException directly and would bypass the catch anyway.
| Uuid uuid = Uuid.fromString(clusterId); | ||
| if (Uuid.RESERVED.contains(uuid)) { | ||
| throw new FormatterException("The specified cluster id, " + clusterId + " is reserved"); | ||
| } |
There was a problem hiding this comment.
Good point. Let's add a comment about this for future reference.
cd317eb to
2a70079
Compare
| /** | ||
| * Validates the correctness of the given cluster id. A valid cluster id is a base64, urlencoded, no padding | ||
| * representation of a {@link Uuid}. These checks do not validate the absence of <code>-</code> character as | ||
| * {@link Uuid#randomUuid()} avoids them only for convenience reasons. |
There was a problem hiding this comment.
I would explicitly mention support for already generated CIDs. Basically it's for historical reasons.
This change adds validation for Uuids in Formatter which is used by
kafka-storage.shwhich allows users to fail fast before they inadvertently end up using an invalid cluster-id.